    Mining time-series data using discriminative subsequences

    Time-series data is abundant and must be analysed to extract usable knowledge. Local-shape-based methods offer improved performance for many problems, as well as a comprehensible way of understanding both the data and the models. For time-series classification, we transform the data into a local-shape space using a shapelet transform. A shapelet is a time-series subsequence that is discriminative of the class of the original series. We use a heterogeneous ensemble classifier on the transformed data. The accuracy of our method is significantly better than that of the time-series classification benchmark (1-nearest-neighbour with dynamic time-warping distance), and significantly better than that of the previous best shapelet-based classifiers. We use two methods to increase interpretability. First, we cluster the shapelets using a novel, parameterless clustering method based on Minimum Description Length, reducing dimensionality and removing duplicate shapelets. Second, we transform the shapelet data into binary data reflecting the presence or absence of particular shapelets, a representation that is straightforward to interpret and understand. We supplement the ensemble classifier with partial classification: we generate rule sets on the binary-shapelet data, improving performance on certain classes and revealing the relationship between the shapelets and the class label. To aid interpretability, we use a novel algorithm, BruteSuppression, that can substantially reduce the size of a rule set without negatively affecting performance, leading to a more compact, comprehensible model. Finally, we propose three novel algorithms for unsupervised mining of approximately repeated patterns in time-series data, testing their speed and accuracy on synthetic data and on a real-world electricity-consumption device-disambiguation problem. We show that individual devices can be found automatically and in an unsupervised manner using a local-shape-based approach.
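    To make the shapelet transform and the binary presence/absence representation concrete, here is a minimal sketch. It assumes the common formulation in which a series' value for a shapelet is the minimum Euclidean distance between the z-normalised shapelet and every z-normalised subsequence of the same length. The function names and the per-shapelet distance thresholds used for binarisation are hypothetical illustrations, not the exact procedure of the work above, and shapelet discovery, MDL clustering and the ensemble are omitted.

```python
import math
from typing import List, Sequence


def zscore(x: Sequence[float]) -> List[float]:
    """Z-normalise a sequence; a constant sequence maps to all zeros."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in x]


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between the z-normalised shapelet and
    every z-normalised subsequence of the series with the same length."""
    m = len(shapelet)
    s = zscore(shapelet)
    best = math.inf  # stays inf if the series is shorter than the shapelet
    for start in range(len(series) - m + 1):
        sub = zscore(series[start:start + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(sub, s)))
        best = min(best, d)
    return best


def shapelet_transform(dataset: Sequence[Sequence[float]],
                       shapelets: Sequence[Sequence[float]]) -> List[List[float]]:
    """Map each series to a vector of distances, one per shapelet."""
    return [[shapelet_distance(ts, sh) for sh in shapelets] for ts in dataset]


def binarise(transformed: Sequence[Sequence[float]],
             thresholds: Sequence[float]) -> List[List[int]]:
    """Hypothetical presence/absence encoding: 1 if the distance to a
    shapelet is within that shapelet's threshold, else 0."""
    return [[1 if d <= t else 0 for d, t in zip(row, thresholds)]
            for row in transformed]


if __name__ == "__main__":
    data = [[0, 0, 1, 2, 1, 0, 0], [5, 5, 5, 5, 5]]
    shapelets = [[1, 2, 1]]
    dists = shapelet_transform(data, shapelets)
    print(dists)
    print(binarise(dists, thresholds=[0.5]))
```

    Because each series becomes a fixed-length vector with one distance per shapelet, any standard classifier (including an ensemble) can be trained on the transformed data, and the binary version lends itself to rule learning.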

    HER2-enriched subtype and novel molecular subgroups drive aromatase inhibitor resistance and an increased risk of relapse in early ER+/HER2+ breast cancer

    BACKGROUND: Oestrogen receptor-positive/human epidermal growth factor receptor 2-positive (ER+/HER2+) breast cancers (BCs) are less responsive to endocrine therapy than ER+/HER2- tumours. The mechanisms underpinning the differential behaviour of ER+/HER2+ tumours are poorly characterised. Our aim was to identify biomarkers of response to 2 weeks' presurgical aromatase inhibitor (AI) treatment in ER+/HER2+ BCs. METHODS: All available ER+/HER2+ BC baseline tumours (n=342) in the POETIC trial were gene-expression profiled using BC360™ (NanoString), covering intrinsic subtypes and 46 key biological signatures. Early response to AI was assessed by changes in Ki67 expression and by residual Ki67 at 2 weeks (Ki67_2wk). Time to recurrence (TTR) was estimated using Kaplan-Meier methods and Cox models adjusted for standard clinicopathological variables. New molecular subgroups (MS) were identified using consensus clustering. FINDINGS: HER2-enriched (HER2-E) subtype BCs (44.7% of the total) showed poorer Ki67 response and higher Ki67_2wk (p<0.0001) than non-HER2-E BCs. High ERBB2 expression, homologous recombination deficiency (HRD) and TP53 mutational score were associated with poor response, and immune-related signatures with high Ki67_2wk. Five new MS associated with differential response to AI were identified. HER2-E BCs had significantly poorer TTR than Luminal BCs (HR 2.55, 95% CI 1.14–5.69; p=0.0222). The new MS were independent predictors of TTR, adding significant value beyond intrinsic subtypes. INTERPRETATION: Our results identify HER2-E as a standardised biomarker associated with poor response to AI and worse outcome in ER+/HER2+ BCs. HRD, TP53 mutational score and immune-tumour tolerance are predictive biomarkers of poor response to AI. Lastly, the novel MS identify additional non-HER2-E tumours that do not respond to AI and carry an increased risk of relapse.

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions, 35 of which are now described by a novel, more significantly associated lead SNP; the originally reported variant remained the lead SNP in only 4 regions. We also confirmed two association signals in Europeans that had previously been reported only in East-Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using ENCODE data filtered for PrCa cell lines, together with eQTL analysis, demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.

    Mutations in the histone methyltransferase gene KMT2B cause complex early-onset dystonia.

    Histone lysine methylation, mediated by mixed-lineage leukemia (MLL) proteins, is now known to be critical in the regulation of gene expression, genomic stability, cell cycle and nuclear architecture. Although MLL proteins are postulated to be essential for normal development, little is known about the specific functions of the different MLL lysine methyltransferases. Here we report heterozygous variants in the gene KMT2B (also known as MLL4) in 27 unrelated individuals with a complex progressive childhood-onset dystonia, often associated with a typical facial appearance and characteristic brain magnetic resonance imaging findings. Over time, the majority of affected individuals developed prominent cervical, cranial and laryngeal dystonia. Marked clinical benefit, including the restoration of independent ambulation in some cases, was observed following deep brain stimulation (DBS). These findings highlight a clinically recognizable and potentially treatable form of genetic dystonia, demonstrating the crucial role of KMT2B in the physiological control of voluntary movement. Funding for the project was provided by the Wellcome Trust for UK10K (WT091310) and DDD Study. The DDD study presents independent research commissioned by the Health Innovation Challenge Fund [grant number HICF-1009-003] - see www.ddduk.org/access.html for full acknowledgement. This work was supported in part by the Intramural Research Program of the National Human Genome Research Institute and the Common Fund, NIH Office of the Director. This work was supported in part by the German Ministry of Research and Education (grant nos. 01GS08160 and 01GS08167; German Mental Retardation Network) as part of the National Genome Research Network to A.R. and D.W. and by the Deutsche Forschungsgemeinschaft (AB393/2-2) to A.R. Brain expression data were provided by the UK Human Brain Expression Consortium (UKBEC), which comprises John A. Hardy, Mina Ryten, Michael Weale, Daniah Trabzuni, Adaikalavan Ramasamy, Colin Smith and Robert Walker, affiliated with UCL Institute of Neurology (J.H., M.R., D.T.), King’s College London (M.R., M.W., A.R.) and the University of Edinburgh (C.S., R.W.).

    Breast cancer risk variants at 6q25 display different phenotype associations and regulate ESR1, RMND1 and CCDC170.

    We analyzed 3,872 common genetic variants across the ESR1 locus (encoding estrogen receptor α) in 118,816 subjects from three international consortia. We found evidence for at least five independent causal variants, each associated with a different set of phenotypes, including estrogen receptor (ER(+) or ER(-)) and human ERBB2 (HER2(+) or HER2(-)) tumor subtypes, mammographic density and tumor grade. The best candidate causal variants for ER(-) tumors lie in four separate enhancer elements, and their risk alleles reduce expression of ESR1, RMND1 and CCDC170, whereas the risk alleles of the strongest candidates for the remaining independent causal variant disrupt a silencer element and putatively increase ESR1 and RMND1 expression. This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/ng.352